ABSTRACT

Breast tissue density has been shown to be related to the risk of developing breast cancer, since dense breast
tissue can hide lesions, causing the disease to be detected at later stages. Thus there is a need for
efficient techniques for the classification of breast density. In the proposed work, breast density (Fatty and Dense) is used
as a pattern for classification. The experiments were carried out on the mini-MIAS database, which contains
images from screening mammography and has been widely used in recent research. Texture features based on Grey
Level Difference Statistics and Fourier Power Spectrum are used to represent the texture pattern of fatty and
dense mammograms. These features are then fed to a Support Vector Machine (SVM) classifier to classify fatty and dense
mammograms. The results show that the extracted features perform best with the Polynomial Kernel of the SVM
classifier, giving an accuracy of 97.25%. The experimental results encourage the use of the proposed method
for the classification of breast density.
INTRODUCTION

Despite advances in diagnosis and treatment, cancer is still a major threat to society.
Breast cancer is the most frequently diagnosed malignancy in women [1-3]. According to the
International Agency for Research on Cancer, breast cancer comprises 22.9% of invasive cancers affecting women.
Data from the Population Based Cancer Registry (PBCR) also reveal that around 25-32% of women living in metro
cities of India are affected by this abnormality [4]. In Punjab state, around 4000 cases of various types of cancer have been
reported during the last six months, and most of these cases are breast cancer. The worst affected are the districts of
the Malwa region of Punjab. With widespread acceptance of mammography as a screening tool, there is a need to
process such images efficiently using techniques of computer vision. Previous studies have shown that the sensitivity of
breast cancer detection decreases as breast density increases, because high density makes it difficult for
radiologists to see an abnormality, which leads to false negative results [5-7]. Radiologists primarily estimate breast density
by visual judgment of the mammogram, which is highly subjective. Automatic breast density classification methods attempt to
mimic such visual judgment and classify images on the basis of underlying texture characteristics [8].
This paper presents a scheme for the classification of fatty and dense mammograms based on texture features
extracted using the Grey Level Difference Statistics (GLDS) and Fourier Power Spectrum (FPS) models. The extracted features
are then fed to a Support Vector Machine (SVM) classifier for classification. The proposed methodology consists of
three steps: Region of Interest Extraction, Texture Feature Extraction and Classification.

The remainder of the paper is organized as follows: Section II describes the related work done in the area of breast
density classification. Section III describes the materials and methods. In Section IV, results and discussions are presented.
Finally, conclusions are given in Section V.

LITERATURE REVIEW 

In the context of mammography and breast cancer, several works have explored the use of Computer Aided
Diagnosis (CAD) systems. Oliver et al. proposed a CAD system for classification of breast density using morphological and
texture features [8]. A set of 322 images from the mini-MIAS database and 831 images from the Digital Database for Screening
Mammography (DDSM) were used for evaluating the performance of the proposed system. For classification, a
Decision Tree classifier, a Bayesian classifier and k-Nearest Neighbor were used. Bovis et al. proposed an approach for the
classification of mammograms on the basis of breast density [9]. A total of 377 mammograms from DDSM were selected
to evaluate the performance. The authors investigated the use of Spatial Grey Level Dependency (SGLD) matrices, Fourier
Power Spectrum (FPS), Laws' Texture Energy Measures and Discrete Wavelet Transform (DWT) based features for classifying
mammograms. Subashini et al. proposed an automatic approach for assessing breast tissue density [10].
The mini-MIAS database was used to evaluate the performance. Various statistical features were extracted from the ROI to
represent the texture. A Support Vector Machine (SVM) was used to classify the images. Tzikopoulos et al.
presented an approach for automatic segmentation of the breast and a classification scheme for breast tissue density estimation
[11]. The proposed algorithm was tested on the mini-MIAS database. From each image, first-order statistical features and fractal
features were extracted. Classification was done by SVM. Ibrahim et al. proposed an approach for the classification of
breast masses [12]. The authors extracted sixty-one features for classification based on the proposed
visual method. For classification they used k-NN and Support Vector Machine.

From the literature studied, it has been found that one of the challenging aspects of CAD systems is to extract
features from the images that efficiently represent their diagnostic and visual information content. Many other
issues must also be considered in the design of a Computer Aided Diagnosis system, including Region of Interest (ROI)
extraction, feature extraction, selection of optimal features from the extracted features, and classification.

Although a lot of work has been done in the area of breast density classification, it is still a subject of great
importance and relevance due to the increasing prevalence of breast cancer across the globe.

MATERIALS AND METHODS 

• Mammogram Database Used 

For carrying out the proposed work, the mini-MIAS database [13] has been used. This standard database contains 322
images in Medio-Lateral Oblique (MLO) view. The original MIAS database (digitized at 50 micron pixel edge) has been
reduced to 200 micron pixel edge and clipped/padded so that every image is 1,024 x 1,024 pixels. The images in the database
have ground truth provided by experienced radiologists, which includes the location of the abnormality, the radius of the circle enclosing
the abnormality, the character of the background tissue and the severity of the abnormality. The images in this database are classified into
three categories based on their density: fatty, fatty-glandular and dense-glandular. In this study, all the Fatty-Glandular
and Dense-Glandular mammograms are treated as one group of dense mammograms, giving a two-class classification
problem (Fatty, Dense). Table 1 shows the division of breast density classes in the mini-MIAS database.

Table 1: Division of Breast Density Classes in Mini-MIAS Database 


Class   Character of Background Tissue   No. of Images
I       Fatty (F)                        106
II      Fatty-glandular (G)              104
III     Dense-glandular (D)              112
        Total Images                     322
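
The grouping of the three mini-MIAS density classes into two can be sketched as follows. This is a minimal illustration using the counts from Table 1; the dictionary and function names are hypothetical and not part of the original experiments.

```python
# Image counts per density class, taken from Table 1.
MIAS_COUNTS = {"F": 106, "G": 104, "D": 112}

# Assumed label mapping for the two-class problem described in the text:
# fatty stays "Fatty"; fatty-glandular and dense-glandular become "Dense".
TWO_CLASS = {"F": "Fatty", "G": "Dense", "D": "Dense"}


def merge_counts(counts, mapping):
    """Sum per-class image counts under a coarser label mapping."""
    merged = {}
    for label, n in counts.items():
        target = mapping[label]
        merged[target] = merged.get(target, 0) + n
    return merged


two_class_counts = merge_counts(MIAS_COUNTS, TWO_CLASS)
print(two_class_counts)
```

Under this mapping the Dense class holds roughly twice as many images as the Fatty class, which is worth keeping in mind when reading accuracy figures.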


[Block diagram: Mammogram Database -> Region of Interest (ROI) Extraction -> Texture Feature Extraction Using Grey Level Difference Statistics and Fourier Power Spectrum Model -> Classification Using Support Vector Machine (SVM)]



Figure 1: Block Diagram of Proposed Methodology 


• Region of Interest and Texture Feature Extraction 

The block diagram of the proposed methodology is given in Figure 1. For the experiments, Regions of
Interest (ROIs) of size 200 x 200 pixels were manually cropped from the centre of the breast tissue immediately behind the nipple, in
such a way that each ROI contains tissue pattern only, excluding the pectoral muscle and background area [14-16]. A fatty
mammogram visually differs from a dense mammogram in terms of tonal variations (intensity-based properties such as contrast and
brightness), as high-density tissue looks brighter in mammography. In the proposed work, texture features are extracted using
the Grey Level Difference Statistics model and the Fourier Power Spectrum model.
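
The ROIs in this work were cropped manually. Purely as an illustration of the window geometry, the sketch below cuts a 200 x 200 window around a chosen centre and keeps it inside the image. The function name, the synthetic image and the centre coordinates are assumptions made for the example.

```python
import numpy as np


def crop_roi(image, center_row, center_col, size=200):
    """Return a size x size window centred on (center_row, center_col).

    The window is shifted as needed so that it stays inside the image.
    """
    rows, cols = image.shape
    if size > rows or size > cols:
        raise ValueError("ROI is larger than the image")
    half = size // 2
    top = min(max(center_row - half, 0), rows - size)
    left = min(max(center_col - half, 0), cols - size)
    return image[top:top + size, left:left + size]


# Synthetic stand-in for a 1,024 x 1,024 mini-MIAS image.
mammogram = np.zeros((1024, 1024), dtype=np.uint8)
# Hypothetical centre; in the paper the location is chosen by eye.
roi = crop_roi(mammogram, center_row=500, center_col=300)
```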

• Grey Level Difference Statistics Features 

The GLDS algorithm uses first-order statistics of local property values based on the absolute differences between
pairs of gray levels or of average gray levels to extract the following 5 texture measures: Homogeneity, Contrast, Mean,
Energy and Entropy [17]. These features are based on the absolute difference between pairs of gray levels separated by a
displacement $\delta = (\Delta x, \Delta y)$. For a given displacement $\delta = (\Delta x, \Delta y)$, the difference image $f_\delta(x, y)$ is defined as:

$$f_\delta(x, y) = |f(x, y) - f(x + \Delta x, y + \Delta y)| \tag{1}$$

and $p_\delta$ is the probability density (gray-level histogram) of $f_\delta(x, y)$ for $m$ gray levels. Various texture features
extracted from $p_\delta$ are:
• Homogeneity (HOMG): It is a measure of similarity in grey level intensities.

$$\mathrm{HOMG} = \sum_i \frac{p_\delta(i)}{1 + i^2} \tag{2}$$

• Contrast (CNTG): It is a measure of grey level intensity difference between neighboring pixels.

$$\mathrm{CNTG} = \sum_i i^2\, p_\delta(i) \tag{3}$$

• Mean (MENG): It is the average value of the grey level intensities within a given area.

$$\mathrm{MENG} = \frac{1}{m} \sum_i i\, p_\delta(i) \tag{4}$$

• Energy (ENGG): It represents amplitude of grey level values.

$$\mathrm{ENGG} = \sum_i p_\delta(i)^2 \tag{5}$$

• Entropy (ENTG): It measures the randomness in grey level intensities within the given area.

$$\mathrm{ENTG} = -\sum_i p_\delta(i) \log p_\delta(i) \tag{6}$$
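
The five GLDS measures can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation. It assumes the standard GLDS definitions (inverse-difference homogeneity with denominator 1 + i², mean scaled by 1/m), a single non-negative displacement, and 256 grey levels; the function name and its parameters are hypothetical.

```python
import numpy as np


def glds_features(image, dx=1, dy=0, levels=256):
    """GLDS features for an integer grey-level image and displacement (dx, dy).

    Assumes dx >= 0 and dy >= 0; rows are indexed by y, columns by x.
    """
    img = np.asarray(image, dtype=np.int64)
    rows, cols = img.shape
    # Difference image |f(x, y) - f(x + dx, y + dy)| on the overlapping region.
    diff = np.abs(img[:rows - dy, :cols - dx] - img[dy:, dx:])
    # Normalised histogram of the differences, p_delta(i).
    p = np.bincount(diff.ravel(), minlength=levels).astype(float)
    p /= p.sum()
    i = np.arange(p.size)
    nonzero = p > 0  # skip empty bins so that log(0) never occurs
    return {
        "homogeneity": float(np.sum(p / (1.0 + i ** 2))),
        "contrast": float(np.sum(i ** 2 * p)),
        "mean": float(np.sum(i * p) / levels),
        "energy": float(np.sum(p ** 2)),
        "entropy": float(-np.sum(p[nonzero] * np.log(p[nonzero]))),
    }
```

For a perfectly uniform region every difference is zero, so homogeneity and energy equal 1 while contrast, mean and entropy equal 0. Larger and more varied differences lower homogeneity and raise contrast and entropy.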

• Fourier Power Spectrum Features

This texture model contains information on the texture orientation, grain size, and texture contrast of the
image. The Discrete Fourier Transform (DFT) approach is used here for texture quantification because repetitive global
patterns are difficult to describe with spatial techniques but relatively easy to represent with peaks in the spectrum [18].
The Radial sum and the Angular sum of the DFT were computed to describe texture. FPS features are computed from the
power spectrum in the frequency domain:

$$|F(u, v)|^2 = F(u, v)\, F^*(u, v) \tag{7}$$

where $F(u, v)$ is the Fourier transform of the image and $F^*(u, v)$ is the complex conjugate of the Fourier transform
of the image.

Spectral features are expressed in polar coordinates to yield a function $S(r, \theta)$. For each direction $\theta$, $S(r, \theta)$ can
be expressed as $S_\theta(r)$, and similarly for each frequency $r$, $S(r, \theta)$ can be expressed as $S_r(\theta)$. Analyzing $S_\theta(r)$ for a fixed
value of $\theta$ gives the behaviour of the spectrum along a radial direction from the origin and is called wedge analysis, whereas
analyzing $S_r(\theta)$ for a fixed value of $r$ gives the behaviour of the spectrum along a circle centered on the origin and is called ring
analysis. A global interpretation is obtained by summing over discrete variables:

$$S(r) = \sum_{\theta=0}^{\pi} S_\theta(r) \tag{8}$$

and

$$S(\theta) = \sum_{r=1}^{R_0} S_r(\theta) \tag{9}$$

where $R_0$ is the radius of a circle centered at the origin.

In this texture model, two features, $S(r)$ and $S(\theta)$, are calculated, and these are measures of the orientation of the texture.
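
A discretised version of the ring and wedge sums can be sketched with NumPy's FFT routines. This is an illustration under stated assumptions rather than the procedure used in the experiments: the number of rings and wedges, the exclusion of the DC term, and the function name are all choices made for the example.

```python
import numpy as np


def fps_sums(image, n_rings=4, n_wedges=4):
    """Ring and wedge sums of the centred power spectrum |F(u, v)|^2.

    n_rings and n_wedges are illustrative bin counts, not values from the text.
    """
    img = np.asarray(image, dtype=float)
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    power = (spectrum * np.conj(spectrum)).real  # F(u, v) F*(u, v)

    rows, cols = img.shape
    v, u = np.indices((rows, cols))
    u = u - cols // 2  # horizontal frequency, zero at the centre
    v = v - rows // 2  # vertical frequency, zero at the centre
    radius = np.hypot(u, v)
    # The spectrum of a real image is symmetric, so angles fold into [0, pi).
    theta = np.mod(np.arctan2(v, u), np.pi)

    r_max = min(rows, cols) // 2  # R_0: largest circle inside the spectrum
    inside = (radius > 0) & (radius <= r_max)  # drop the DC term

    ring_idx = np.minimum((radius / r_max * n_rings).astype(int), n_rings - 1)
    wedge_idx = np.minimum((theta / np.pi * n_wedges).astype(int), n_wedges - 1)

    # Sum over all angles within each radius band, and over all radii
    # within each angular sector.
    ring_sums = np.bincount(ring_idx[inside], weights=power[inside],
                            minlength=n_rings)
    wedge_sums = np.bincount(wedge_idx[inside], weights=power[inside],
                             minlength=n_wedges)
    return ring_sums, wedge_sums
```

A pattern that repeats along one direction concentrates its power in a single wedge, while the ring that holds the peak reflects the spatial frequency (grain size) of the pattern.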

• Classification of Breast Density 

For classifying the mammograms into fatty and dense classes, the Support Vector Machine (SVM) [19] classifier has
been used. The SVM classifier is widely used in recent research [20-24]. SVM guides the construction of a classifier with a
good degree of generalization, i.e. it is capable of predicting the class of a sample that was not used in the learning
process. For binary classification, SVM can be described as follows: given two classes and a set of points that belong to
these classes, the SVM classifier determines the hyper-plane in the projected feature space that separates the points so as
to place the highest number of points of the same class on the same side, while maximizing the distance of each class to
that hyper-plane. In some cases, the dataset cannot be precisely separated by a hyper-plane, so a kernel function is used. In
this work, three kernel functions have been used for evaluating the performance, namely the Polynomial Kernel, the RBF Kernel
and the Pearson VII function-based universal kernel. For classifying the mammograms into two classes, the WEKA data mining tool
has been used.
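
For reference, the three kernel functions can be written out in their commonly used forms. The sketch below is plain Python and is independent of WEKA. The hyperparameter values (degree, coef0, gamma, sigma, omega) are placeholders, since the settings used in the experiments are not stated.

```python
import math


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (x . y + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def puk_kernel(x, y, sigma=1.0, omega=1.0):
    """Pearson VII function-based universal kernel (PUK).

    sigma controls the half-width of the peak and omega its tailing.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    scaled = 2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega
```

Both the RBF and PUK kernels equal 1 for identical inputs and decay with distance. The PUK kernel falls to one half at a distance of sigma / 2, and omega changes the shape of its tails.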

RESULTS & DISCUSSIONS 

The proposed work for classifying the mammograms into two categories based on their texture has been done on
322 images of the mini-MIAS dataset. The performance of the classifier is evaluated in terms of Sensitivity, Specificity
and Accuracy, and the classification results are shown in Table 2.


Table 2: Classification Performance using Different Kernel Functions 


Kernel Used: Polynomial

Test Result         Actual: Dense (Positive)   Actual: Fatty (Negative)
Dense (Positive)    75 (TP)                    3 (FP)
Fatty (Negative)    0 (FN)                     29 (TN)

Sensitivity (%)   Specificity (%)   Accuracy (%)   Positive Predictive Power   Negative Predictive Power   Misclassification Rate
100               90.6              97.25          0.9625                      1                           0.0275

Kernel Used: RBF Kernel

Test Result         Actual: Dense (Positive)   Actual: Fatty (Negative)
Dense (Positive)    72 (TP)                    5 (FP)
Fatty (Negative)    5 (FN)                     27 (TN)

Sensitivity (%)   Specificity (%)   Accuracy (%)   Positive Predictive Power   Negative Predictive Power   Misclassification Rate
93.5              84.4              90.82          0.9351                      0.8438                      0.0917

Kernel Used: Pearson VII Function-Based Universal Kernel

Test Result         Actual: Dense (Positive)   Actual: Fatty (Negative)
Dense (Positive)    75 (TP)                    3 (FP)
Fatty (Negative)    2 (FN)                     29 (TN)

Sensitivity (%)   Specificity (%)   Accuracy (%)   Positive Predictive Power   Negative Predictive Power   Misclassification Rate
97.4              90.6              95.41          0.9615                      0.9355                      0.0459


The results obtained here compare well with existing approaches.
These results demonstrate that the extracted features give good classification performance.
From the experimental results it has been found that the extracted features, when fed to the SVM classifier with
the polynomial kernel, gave a maximum accuracy of 97.25% with 100% sensitivity and 90.6% specificity. A sensitivity of
100% means that all positives are classified as positives, i.e. all dense mammograms are recognized as dense, which
is highly desirable in the medical field. With the RBF kernel an overall accuracy of 90.82% has been achieved, the lowest of
the three kernels; therefore this may not be a good option for classification. For the Pearson VII function-based universal
kernel, an accuracy of 95.41% has been achieved, with sensitivity and specificity of 97.4% and 90.6% respectively.
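
The measures reported in Table 2 follow directly from the four confusion-matrix counts. As a minimal sketch, the function below (a hypothetical helper, not part of WEKA) computes them, and the example call uses the RBF kernel counts from Table 2.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Derive the Table 2 measures from confusion-matrix counts.

    Dense is the positive class and Fatty the negative class.
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity_pct": 100.0 * tp / (tp + fn),
        "specificity_pct": 100.0 * tn / (tn + fp),
        "accuracy_pct": 100.0 * (tp + tn) / total,
        "positive_predictive_power": tp / (tp + fp),
        "negative_predictive_power": tn / (tn + fn),
        "misclassification_rate": (fp + fn) / total,
    }


# Counts for the RBF kernel as listed in Table 2.
rbf = confusion_metrics(tp=72, fp=5, fn=5, tn=27)
for name, value in rbf.items():
    print(f"{name}: {value:.4f}")
```

Sensitivity and specificity are each computed within one actual class, so they are not affected by the ratio of dense to fatty images in the test set, whereas accuracy and the predictive powers are.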

Experiments demonstrated that the proposed technique gives better results than the approaches
suggested by Mustra et al. [25] and Subashini et al. [10], which achieved accuracies of 91.60% and 95.44% respectively.
This suggests that the extracted features are well suited to characterizing the texture pattern of fatty and dense
mammograms.

CONCLUSIONS 

In selecting effective features for CAD systems for medical images, great research efforts have been focused
on identifying and extracting better features to capture the texture of images and improve correlation with human
visual similarity. In this paper an attempt has been made to classify mammograms on the basis of breast density using
texture features. The classification accuracy has been tested on the mini-MIAS database. The results provide
evidence that the Grey Level Difference Statistics features and Fourier Power Spectrum based features can be used for
developing a CAD system for the classification of breast density. An accuracy of 97.25% with 100% sensitivity and 90.6%
specificity has been achieved when the extracted features are fed to the SVM classifier with the Polynomial Kernel. Thus these
models can be explored for disease classification tasks and retrieval applications in CAD systems.